BINARY TREE STRUCTURES FOR DATA BASE INDEXING OF DBF FILES The set of data structures, procedures, functions and associated code provided will allow the developer to maintain binary tree sorted list files of DBF files (DBASE III+ compatible) based on a Field Array of up to 10 DBase Field Descriptors for each index structure needed. These are true binary trees that were taught to you in college DB Architecture class: Root / \ / \ Son Daughter / \ / \ / \ / \ NULL----->... ... ... Daughter / \ ... ... And yes, this structure carries with it the same liability it always has; the necessity of wasting space for NULL ADDRESS pointers which I don't con- sider all that bad given the capabilities of the structure for very large data bases where it's true efficiency (when balanced) and simplicity still rivals more modern, complex indexing schemes. Within these HDX index files is the capability, through code and data structure development to maintain and control a MAINTENANCE STACK of unused address space within the file using a somewhat hybrid linked list data struc- ture whose size in bytes matches the size in bytes of the node structure for the tree. All this stack creation and maintenance is carried out in order to max- imize and utilize file space released through record deletions and still maintain index integrity without having to rebuild the files after associated DBF files have been packed (which procedure does actually rebuild the file image by copying active records to a new file and skipping over those that are marked for deletion) thereby requiring a consolidation of the associated index files. This real-time data integrity of these index files also lends itself quite well to the development of a NETWORKABLE Database and real-time multi-user index structure (more on this later). In order to conserve space, all HDX index files have HEADERS, along with the abovementioned node and stack elements and individual index headers (see below). These headers store the image of each root node for each index (yes, more than one index can reside within one HDX file) and each subsequent LChld (SON) or RChld (DAUGHTER) Node then holds it's own address as well as that of it's PARENT, Son & Daughter and the RECORD NUMBER where the index entry INFO is stored in the DBF file. For purposes of clarity, which every programmer will tell you is the true measure of how good a person you are, let me say that it goes without saying that ADDRESS herein refers to the FILE OFFSET within the HDX file of these structures. Stack elements will be maintained in the HDX file in the same file header. There the first record of this linked list is stored and all unused node/stack addresses within the HDX file are maintained through this list for use in sub- sequent insert operations. Comparisons within this structure are done "on the fly" and no overhead for comparison text/info is made within the HDX file. One of my frustrations with Dbase NDX files was the amount of memory used to replicate information already found within the database. This duplication of information seemed wasteful, yet apparently the prime reason for this method is to speed up any searches by limiting disk accesses to a minimum. This really isn't a problem anymore and file space is; so, for now that's about as far as I'll go in explaining the rationale for my file scheme. Except to say that I had a net- work, multi-user environment in mind when I designed it and this way seemed the best for DB development where record locking/unlocking was my responsi- bility and real-time index integrity accross a multi-user platform could be reasonably assured. The resulting HDX file, but for the header, is basically a "mish-mash" of information with no apparent structure when viewed under the harsh eyes of a diagnostic rationale and can be very volatile were a sudden power surge or network failure to occur. So as difficult as it already is to lay down the basics of my design, I will stick to a single user environment which can still utilize this scheme and avoid talk of "networkability" as much as pos- sible. Reindexing under this structure is relegated to HDX file recreations after data integrity has been lost due to the aforementioned and any other computer disasters. Binary tree algorithm search speed will compensate for reads and comparisons done "on the fly". Lastly, since computer disasters are inevitable and there exists the very real possibility of a "linear tendency" within a binary tree structure which will wreak havoc on any recursive procedure (did I mention I used recursion?), I have included a random number range generator which will create a "random" sequence of record numbers during insertion to insure balanced tree formation when reindexing a DBF file (near as I can tell, this generator fills up to 90-95% of numbers in the range on one pass). Any chance that records were entered in the order specified by an index necessitates this method of tree formation for existing DBF files to maximize speed and memory and eliminate potential problems. I won't go into much esoteric or academic detail regar- ding binary trees and their implementation; so, if you are not familiar with them I strongly suggest you research the topic before I thoroughly muddy the waters with my implementation. All code, data structures and algorithms were designed in and for use in the Visual Basic for Dos programming environment and for BINARY file mode implementation. Don't you just hate the legal niceties of life? _____________________________________________________________________________ This specification was developed by J. Lerma Jr. \dba\ InterActive Solutions. Although it has been released into the public domain and is not confidential or proprietary, the specification is still the copyright and property of J. Lerma Jr. & InterActive Solutions. Disclaimer of Warranty J. LERMA JR. & INTERACTIVE SOLUTIONS EXCLUDE ANY AND ALL IMPLIED WARRANTIES, INCLUDING WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. NEITHER J. LERMA JR. OR INTERACTIVE SOLUTIONS MAKE ANY WARRANTY OF REPRESENTA- TION, EITHER EXPRESS OR IMPLIED, WITH RESPECT TO THIS SPECIFICATION, ITS QUA- LITY, PERFORMANCE, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. NEITHER J. LERMA JR. OR INTERACTIVE SOLUTIONS SHALL HAVE ANY LIABILITY FOR SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF OR RESULTING FROM THE USE OR MODIFICATION OF THIS SPECIFICATION. _____________________________________________________________________________ Whew, glad that's over. All information on DBF file structures and code for their implementation is credited to Ethan Winer and his book: "BASIC Techniques & Utilities" PC Magazine / ZD Press Emeryville, California Copyright 1991 If you don't already have this book, I recommend it highly. On with the show. HDX File Header------------------------------------------------------------- Lngth in Address Bytes Desc ---------------------------------------------------------------------------- 1 2 Header Flag, Reserved 3 2 Word for Integer Index Header Count 5 396 99 Dwords, addresses for Index Headers(*) 401 568 142 Dwords, Reserved 969 4 Dword, Reserved for HDX signature: Long Int 969 973 20 First Stk Element 993 4 Dword, Reserved 997 4 Dword, Long Int Length of File ---------------------------------------------------------------------------- File Offset: 1000 Bytes for HDX Header (*) 100 Bytes for Index Header -- NDXVOL (see below) 20 Bytes for Node Record -- NDXNODE (see below) 20 Bytes for Stack Record -- STKTYPE (see below) The file header takes up the first 1000 bytes of the HDX file and is where all index requests begin. For my purposes, I have limited the individual index count to 99 within an HDX file (changing this would alter the header size, the signature Dword position and in general wreak havoc with this scheme). The last four bytes of the header contain the HDX file length which is updated after each insert request. Subsequent data structures to be found in the file beginning at position 1001 include: TYPE NdxVol TYPE NdxNode TYPE StkType Flag AS INTEGER RecNo AS LONG Addr1 AS LONG TName AS STRING * 8 Prnt AS LONG Addr2 AS LONG DName AS STRING * 8 LChld AS LONG Addr3 AS LONG HdrLng AS INTEGER RChld AS LONG Nxt AS LONG RecLng AS INTEGER Addr AS LONG Addr AS LONG Unq AS INTEGER END TYPE END TYPE TRecs AS LONG Root AS NdxNode FldCnt AS INTEGER KeyFld(0 TO 9) AS FCBlk (See below) END TYPE We begin with NdxVol. ------------------------ The upto 99 Index headers can occur anywhere inside the HDX file, allowing for new index creation for an HDX file already in use. This necessitates the allocation of space in the header to store their addresses within the file as well as a count of Indexes in same. Flag - for network use only. TName - an 8 character unique name given to the index for identification purposes. DName - an 8 character name taken from the DBF file (i.e. Customer.dbf would require a DName of "Customer" for the Index Header). HdrLng - DBF file Header length (See below) RecLng - DBF file Record length (See below) Unq - Integer flag for Index's "unique" elements status: True\False TRecs - Total number of records in DBF file (See below) Root - See NdxNode FldCnt - Count of number of Key fields to index by KeyFld - Data structure holding Key Field information (See DBF structures) for the 10 Key fields: TYPE FCBlk FNum AS INTEGER - Offset of field in record FType AS STRING * 1 - Field type: "N","C","D","L","M" FLen AS INTEGER - Length of field in record END TYPE NdxNode. ------------ This 20 byte data structure is the heart of the HDX file. 5 Long Integers provide orientation, position, direction and location for itself and the record number for it's info in the DBF file. This is necessary since nothing can be accomplished without knowing exactly where you are inside an HDX file at any given time. Recno - Position of record within it's DBF file. The obvious problem here is that a PACKING procedure on a DBF file would necessitate realignment of all records in an index under this structure. An alternative would be a seperate maintenance facility of records in a DBF file to use empty record positions after copying deleted record info to a safe area and avoid packing altogether. Prnt - Points to the index entry's PARENT position within the HDX file. The Parent is an NdxNode which was previously inserted and based on a comparison criteria, accepts the current NdxNode's entry and assigns or passes it's position further down the tree. -------------------------------------------------------------------------- In a BALANCED tree, all but the last generation of nodes utilize these variables for something other than NULL storage; but, there is no way around the fact that a significant amount of Dwords in the HDX file are used simply for maintaining NULL pointers in a PURE binary tree scheme. LChld - Points to an NDXNode's Less-Than/Greater-Than SON depending on comparison criteria. RChld - Points to an NDXNode's Less-Than/Greater-Than DAUGHTER depending on comparison criteria. -------------------------------------------------------------------------- Addr - All NdxNode's have their file offset stored within themselves. Although this sounds redundant it makes traversal, maintenance, and all other operations easier and faster. StkType. ------------ Sizing this data structure to match the size of an NdxNode was crucial to its design. Still in all, using a Stack Element which holds upto 3 empty positions in an HDX file before giving itself up for use was not all that difficult a task to implement. Addr1,2,3 - Holds addresses of empty positions in HDX file. (Once again NULL pointers will waste space, but no more so than had the address held a normal NdxNode) Nxt - Holds address of next Stack node in the linked list and is NULL until the Node is full and another position has been freed. This being the case, the new position is converted to a Stack Element, it's address provided for the previous last entry in stack and placed here, and the new position's Addr1,2,3 are NULLED. Addr - Stack element's offset inside the HDX file. ------------------------------------------------------------------------------ Dbase DBF Structures ------------------------------------------------------------------------------ For interpretation of these data structures, I refer you the the afore- mentioned publication by Ethan Winer, my code, and any other reference mate- rial you can find. The structures and code have been modified slightly from Mr. Winer's description to accomodate my indexing implementation and are better off left as is (same applies to the code) unless you are well acquain- ted with DBASE file structures. TYPE DBFHeadStruc TYPE FieldStruc Version AS INTEGER FName AS STRING * 10 Memo AS INTEGER FType AS STRING * 1 Ano AS INTEGER FOff AS INTEGER Mes AS INTEGER FLen AS INTEGER Dia AS INTEGER Dec AS INTEGER FirstRec AS INTEGER END TYPE TRecs AS LONG RecLen AS INTEGER TFields AS INTEGER END TYPE ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ This structure utilizes the DBF data structures above to define the DBF file being worked with an associated index: TYPE DBFileType FName AS STRING * 8 - Same as NdxHeader's DName. Status AS INTEGER - In single user scheme; holds file handle. FileNo AS INTEGER - In multi-user scheme; holds file handle. Fields(100) AS FieldStruc - Limits the number of fields, but is necessary. Header AS DBFHeadStruc - See above. END TYPE ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ I have included the following files for you to ponder over: NDXPACK.BI - Include file for the whole "shibang" DBFNDX.MAK - Project file for the code DBASE.BAS - Dbase file operations NDXPACK.BAS - Hdx & file & Index operations NEXUS.BAS - An interface for working with the recursive Traverse routines. FORM1.FRM - A demo form to accept records from the Traverse routines. EXAM1,2.BAS - Sample code for applying the modules above once they have become converted into a QUICK LIBRARY. As for the rest, I leave you to my code which is copiously filled with remarks for you to sneer and laugh at. Seriously though, I would appreciate any suggestions or ideas you may have to better this scheme. All of this is condensed into the working demo zip file NDXDBF.EXE found in the same forum that this came from, and from which I got a mediocre res- ponse. I suppose since the ISAM/Access database development kits & tools offer a quicker and easier method than this madness, it was to be expected, yet as near as I can tell, the ability to provide network access and control to the developer for SINGLE RECORDS under these very capable development tools is either not available or eludes me. If you should find all this code and information useful and/or to your liking, please show your appreciation by sending $25.00 to: J. Lerma Jr. 2413 McClelland Laredo, Tx. 78040 Include another $5.00 and I will send you my source code for implementing this scheme to work for network/multi-user VBMSDOS applications. Please in- clude disk preference with your renumeration. So, you say, why should I spend my hard-earned money on your ideas? Fair enough. Here come's the hard sell. I have developed a scheme for using this implementation in a network environment with the following capabilities: 1. Single record control for locking/unlocking 2. Ability to use Soft/Hard Record & File Locks 3. Multi-user index file capability 4. Reprocess settings for poll busy instances of records 5. Error Messages for index/record polling failures & file corruption 6. Built in index update capability for poll out busy instances on index files which a system operator can retrieve to update the files before they become corrupt 7. Hdx & Dbf File flag clearing routines for system crashes/failures. & MORE! This implementation is designed to do without record managers, drivers, and devices, providing all the tools necessary, through the HDX & DBF file structures and the modules, in a small enough package to place directly into your application and still leave room for your code. Ease and flexibility makes this network version generic enough to adapt from application to appli- cation. Try it you'll like it! To reiterate the capabilities of this scheme: Work with DBF Files with up to 100 fields! OpenDbf let's you pass File Handles to the modules which you control! Search for fields by name or number! Returns a field's DBASE data type! Returns date values in proper BASIC format! (mm-dd-yyyy) Saves date values in proper DBASE format! (yyyymmdd) Checks for NULL Date fields! Double precision capability for numeric fields! Stores Numeric fields according to DBASE Field DEC format! **************************************************************************** **************************************************************************** The INDEX CONSTRUCTOR and accompanying interface will allow you to create index files with BTREE indexes on up to 10 KEY field values which you can TRAVERSE in ascending or descending order with a file overhead of 20 bytes per record! Store several indexes for each DBF file in one index file or group indexes for several DBF files in one Index file! With multiple index capability per file, up to 99 indexes on as many DBF files can be active with only one file open! Built in maintenance capability keeps track of unused file space to prevent bloating of index files! Update your indexes rapidly with one module call and an average search time of approximately 20 seeks for 100,000 records (tested)! Index overhead is fixed at maximum of 10.9K bytes for a full index file and 20 bytes per record regardless of DBF file or record/field length! Constructor interface assumes DBF Records being in index order and assures balanced tree formation for existing DBF files! Traverse routine checks for BTree linear tendency and returns caution through an error code to help prevent stack space errors!